Apache Spark 2 for Beginners by 2016
Author:2016
Language: eng
Format: epub, mobi
Publisher: Packt Publishing
Figure 13
In the preceding section, Spark DataFrames were created to get the number of action movies and drama movies released over the last 10 years. The data was collected into Python collection objects, and the two line graphs were drawn in the same figure.
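The plotting step described above can be sketched roughly as follows. This is a minimal illustration, not the book's own listing: the function names, the axis labels, the output file name, and the assumption that each collected row is a (year, count) pair are all hypothetical.

```python
def rows_to_series(rows):
    """Split collected (year, count) pairs into sorted x and y lists.

    Assumption: each row is a two-field tuple, such as the Row objects
    returned by DataFrame.collect() on a two-column DataFrame.
    """
    pairs = sorted((int(year), int(count)) for year, count in rows)
    years = [year for year, _ in pairs]
    counts = [count for _, count in pairs]
    return years, counts


def plot_genre_trends(action_rows, drama_rows, out_path="genre_trends.png"):
    """Draw both series as line graphs in the same figure and save it."""
    # Imported here so the data helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, rows in (("Action", action_rows), ("Drama", drama_rows)):
        years, counts = rows_to_series(rows)
        ax.plot(years, counts, marker="o", label=label)
    ax.set_xlabel("Release year")       # hypothetical axis labels
    ax.set_ylabel("Number of movies")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

Given two small lists collected from Spark, calling `plot_genre_trends(action_rows, drama_rows)` would write both lines into one image file. Sorting by year in `rows_to_series` keeps the lines from zigzagging if the rows arrive unordered.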
Python, together with the matplotlib library, offers a rich set of methods for producing publication-quality charts and plots. Spark can serve as the workhorse for processing data from heterogeneous sources, and the results can be saved in a wide variety of data formats.
Those who are familiar with the Python data analysis library pandas will find the material covered in this chapter easy to follow, because Spark DataFrames were designed from the ground up by taking inspiration from the R DataFrame as well as pandas.
This chapter has covered only a few sample charts and plots that can be created using the matplotlib library. The main idea of the chapter was to show how this library can be used in conjunction with Spark, with Spark doing the data processing and matplotlib doing the charting and plotting.
The data file used in this chapter is read from the local filesystem. It could equally be read from HDFS or any other Spark-supported data source.
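As a rough sketch of what switching the source looks like, the loader below takes the location as a parameter, so only the path string changes between the local filesystem and HDFS. The function name, the CSV format with a header row, and the example paths and host name are assumptions made for illustration; the chapter's actual file format may differ.

```python
def load_movies(spark, path):
    """Read the movie data from whatever location `path` points to.

    `spark` is an existing SparkSession. Assumption: the data is a CSV
    file with a header row; adjust the reader call for other formats.
    """
    return spark.read.csv(path, header=True, inferSchema=True)


# Hypothetical locations; only the URI scheme differs.
LOCAL_PATH = "file:///tmp/movies.csv"
HDFS_PATH = "hdfs://namenode:8020/data/movies.csv"
```

With a running session, `load_movies(spark, LOCAL_PATH)` and `load_movies(spark, HDFS_PATH)` would return DataFrames that the rest of the program can treat identically.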
When Spark is the primary framework for data processing, the most important point to keep in mind is that as much of the processing as possible should be done in Spark, because Spark distributes that work across the cluster while the driver program is a single process. Only the processed data should be returned to the Spark driver program for charting and plotting.
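The division of labor described above might look like the sketch below, in which filtering, grouping, and sorting all run in Spark and only the small aggregated result is collected to the driver. The function name and the column names `genre` and `year` are hypothetical, since the chapter's schema is not shown here.

```python
def yearly_counts(movies_df, genre):
    """Return (year, count) rows for one genre, aggregated by Spark.

    Assumption: `movies_df` is a Spark DataFrame with `genre` and `year`
    columns. Everything before collect() is executed by Spark; collect()
    then brings back one row per year, not the full dataset.
    """
    return (
        movies_df
        .filter(movies_df["genre"] == genre)  # runs in Spark
        .groupBy("year")                      # runs in Spark
        .count()                              # adds a `count` column
        .orderBy("year")
        .collect()                            # small result to the driver
    )
```

Calling `collect()` on the unaggregated DataFrame and counting in plain Python would give the same numbers, but it would pull every row into the driver's memory first, which is the pattern the paragraph above warns against.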
